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PROGRAM LOADING MECHANISM 
THROUGH A SINGLE INPUT DATA PATH 

Backcfround of the Invention 

This invention relates to the loading of 
5 program code, data, and control information into a 
processing engine. More particularly, this invention 
relates to loading program code, data, and control 
information into a processing engine through a single 
data path. 

10 A processing engine processes input data to 

generate output data. A processing engine typically 
includes the following: a memory, a program counter, 
control logic, an execution pipeline, and a register 
file. The memory holds a stored program, which is a 

15 set of instructions stored in a block of memory. The 
program counter typically contains an address to an 
instruction in memory. After an instruction is read, 
the program counter is reloaded with the address of the 
next instruction to be read. Control logic decodes the 

20 current instruction, thereby controlling the execution 
pipeline. An instruction typically contains the 
following: one or more operands (e.g., contents of 
source registers or memory addresses, or constants) , an 
operation code (e.g., add, subtract, multiply, shift, 

25 load, store, branch, etc.), and a destination register 
or memory address for a resulting value. The execution 
pipeline computes output data from input data by 
performing arithmetic and logic operations defined by 



the decoded instruction. Each operation is preferably 
processed in multiple stages (e.g., a fetch stage, a 
decode stage, an execute stage, and a write-back stage) 
such that, where possible, the different stages can be 
5 overlapped to increase throughput (i.e., the rate at 
which temporary values or partial results are computed 
by the execution pipeline) . The register file 
typically holds temporary values or partial results 
computed by the execution pipeline. In addition, the 

10 register file can hold constants that are initialized 
before the program executes. The temporary values and 
constants can be changed from time to time during 
program execution. 

Processing engines of a known class typically 

15 do not have the ability to perform random access of 
their input data. Instead, those engines can only 
access (or read) a next piece of data from the input 
data stream. If no data is available, such processing 
engines stall until data becomes available. Logically, 

20 the input data stream can be considered the output of a 
first-in-first-out (FIFO) system. The data positioned 
first in the input data stream is processed first. 
Successive sequences of input data may need to be 
processed by different programs or by using different 

25 sets of constants in the register file. However, the 
memory and register file are often too small to 
accommodate all possible programs and constants 
simultaneously. As a result, either the memory, 
register file, or both will need to be reloaded in 

30 preparation for each sequence of input data. 

A typical sequence of operations required by 
known processing engines is shown in table (1) . 
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1 


Load a program into Memory 


2 


Load constants into the Register File 


3 


Initialize the Program Counter 


4 


Supply an arbitrary number of pieces of input data 


5 


Wait for all of the input data to be processed 


6 


Load new constants into the Register File 


7 


Initialize the Program Counter 


8 


Supply an arbitrary number of pieces of input data 


9 


Wait for all of the input data to be processed 


10 


Load a new program into Memory 


11 


Initialize the Program Counter 







Table (1) 



Different data paths are used for 

15 initializing the system and executing a stored program. 
To ensure that a data stream has been processed to 
completion before loading a new program or a new set of 
constants, initialization must be synchronized by 
external control logic. 

20 Known processing engines have several 

disadvantages. First, the use of multiple input data 
paths increases hardware complexity. Additional wiring 
and logic circuitry are needed. For example, 
multiplexers are needed to select whether 

25 initialization data or program data is to be sent to a 
register file. Also, multiple operations (e.g., run 
mode and setup mode) do not implement the same 
functions the same way. For example, the run mode 
(i.e., program mode) function of writing to a register 

30 file is typically different than the setup mode 

function of writing to a register file. In setup mode, 
an address is sent to a register file directly from an 



initialization path and a multiplexer. In run mode, an 
address is sent to the register file via control logic 
and the multiplexer. Second, the use of external 
control logic increases hardware and creates a 
5 dependency on an external source. External control 
logic is needed to synchronize the loading and 
execution of data through the multiple input data paths 
such that data is processed in the correct order at the 
appropriate time. The processing engine must function 

10 in the same timing domain as the external control logic 
-- the processing engine cannot operate in a separate 
timing domain nor can data be supplied to it at a 
sporadic rate. 

In view of the foregoing, it would be 

15 desirable to provide a processing engine with an 

efficient mechanism for loading program code, data, and 
control information. 

It would also be desirable to provide a 
processing engine that allows a program to be processed 

20 with little or no external control logic. 

Summary of the Invention 

It is an object of this invention to provide 
a processing engine with an efficient mechanism for 
loading program code, data, and control information. 

25 It is another object of this invention to 

provide a processing engine that allows a program to be 
processed with little or no external control logic. 

In accordance with this invention, a 
processing engine is presented with a single input data 

3 0 path. Program data and setup data are passed through 
the same input data path. To identify the input data 
as either setup data or program data, each piece of 
input data preferably contains one or more additional 
bits, known as identification bits. 
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The identification bits indicate whether the 
processing engine is in setup mode (associated with 
setup data) or run mode (associated with program data) . 
Initially after a reset, the processing engine 
5 typically runs in setup mode. In setup mode, control 
logic executes a setup program, which may make use of 
instructions and operations that are not typically 
available in run mode. In run mode, the processing 
engine executes instructions that are read from memory. 

10 When the input data changes from setup data to program 
data, the processing engine automatically switches from 
setup mode to run mode . Similarly, when the input data 
changes from program data to setup data, the processing 
engine automatically switches from run mode to setup 

15 mode. When input data is not available, the system 
enters a wait state in its current mode. 

Input data is passed through a single input 
data path into an execution pipeline. A piece of input 
data identified as program data is processed in the 

20 execution pipeline under the control of the current 

instruction that has been read from memory. A piece of 
input data identified as setup data is propagated 
through the execution pipeline via execution of a pass- 
through instruction. When setup data reaches the 

25 output data bus, control logic generates appropriate 
control signals to load the data into appropriate 
storage locations. Setup data can be decoded further 
from either additional identification bits or some 
subset of the input data. Any additional information 

30 supplied with a piece of input data can be part of an 
instruction sequence of the setup program. 



Brief Description of the Drawings 

The above and other objects and advantages of 
the invention will be apparent upon consideration of 
35 the following detailed description, taken in 



conjunction with the accompanying drawings, in which 
like reference characters refer to like parts 
throughout, and in which: 

FIG. 1 is a block diagram of a known multiple 
5 input data path processing engine; 

FIG. 2 is a block diagram of a single input 
data path processing engine in accordance with the 
present invent ion ; 

FIG. 3 is an illustration of a sample input 
10 data stream in accordance with the present invention; 
and 

FIG. 4 is a flow diagram illustrating the 
processing of program and setup data in a single input 
data path processing engine in accordance with the 
15 present invention. 

Detailed Description of the Invention 

The present invention provides a processing 
engine that loads program code, data, and control 
information through a single input data path. 

20 FIG. 1 shows a known multiple input data path 

processing engine 10 0 that has initialization 
paths 104, 106, 108, 110, and 112 for setup data and an 
input data path 102 for program data. These 
initialization paths are controlled by external control 

2 5 logic, which is not shown. A program can be loaded 

into a memory 122 via an initialization input data path 
(Dj„) 108 at an address specified from an initialization 
write address path (WrAddr) 110. Program instructions 
are typically stored sequentially in a block of 

30 memory 122 . Program execution can be started by 

loading a memory address of a first program instruction 
into a program counter (PC) 126 via initialization 
path 112, control logic 124, and path 150. Data (such 
as program constants) can be loaded into a register 

35 file 114 via initialization path 104, a 



multiplexer 116, and an input data path (Djn) 132. An 
address of register file 114 to which data is to be 
written can be specified using initialization path 106, 
a multiplexer 120, and a write address path (WpA^dr) 138. 
5 Once memory 122, register file 114, and 

program counter 12 6 are initialized, input data from 
path 102 is processed in an execution pipeline 118. An 
instruction is fetched from memory 122 at a current 
address of program counter 126 via path 146 . This 

10 program counter value is also sent to control logic 124 
via read address path (RdAbdr) 146. Control logic 124 
generates a new program counter value (by incrementing 
the current value or by calculating a branch 
destination) and sends the new value to program 

15 counter 126 via path 150. The fetched instruction is 
sent to control logic 124 via output data path 
(Dqut) 144, where the instruction is decoded to generate 
control signals for execution pipeline 118 via 
path 140. Control logic 124 can send a read address to 

20 register file 114 via a read address path (RdAddr) 136. 
This may be used to read a register value for use in 
execution pipeline 118. Control logic 124 can also 
send a write address to register file 114 via a 
path 142, multiplexer 12 0, and write address path 

25 (WrAddr) 138. This may be used to write an output from 
execution pipeline 118 into register file 114. Data 
from register file 114 can be sent to execution 
pipeline 118 via an output data path (Dqut) 130. Once 
the instruction has been processed, its result can be 

30 sent to an output data path 128. Data path 128 is used 
to transfer processed data out of processing 
engine 100. When control logic 124 determines that the 
current instruction will generate a processed data 
output, it signals this using an output data valid 

35 signal 152. Data path 128 is also used to transfer 
temporary or partially-processed data to register 
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file 114 via a path 134, multiplexer 116, and input 
data path {D^J 132. 

FIG. 2 shows a single input data path 
processing engine 2 00 in accordance with an embodiment 
5 of the present invention. Instead of an input data 
path for program data and a plurality of separate 
initialization paths for setup data as shown in FIG. 1, 
processing engine 2 00 uses a single input data 
path 202. Each piece of input data from input data 

10 path 202 includes additional identification bits. 

These additional identification bits are separated from 
the rest of the input data and transported along input 
data identification (ID) path 2 04. Although not shown, 
control logic can be used to read the identification 

15 bits (e.g., reading the most significant bits) and 
sending the identification bits to input data ID 
path 204. The identification bits may be one or more 
bits. They indicate whether a given piece of input 
data is setup data or program data, and may also 

20 contain other information related to that piece of 
input data. 

When the identification bits indicate that 
data is program data, control logic 124 signals 
execution pipeline 118 via path 140 to process the next 

25 instruction of a stored program residing in memory 122 
(via output data path (Dqut) 144, control logic 124, and 
path 140) . The instruction may be executed using any 
suitable pipelining method. Execution pipeline 118 
continues to execute instructions from the stored 

30 program (which may include reading values from register 
file 114 or writing values to the output of processing 
engine 200 via output data path 128) until control 
logic 124 reads the identification bit(s) of the next 
piece of data. The identification bit(s) of the next 

35 piece of data determines what mode processing 
engine 2 00 will operate in next. 



When the identification bits indicate that 
data is setup data, control logic 124 signals execution 
pipeline 118 via path 140 to execute a pass-through 
instruction. A pass- through instruction allows a piece 
5 of input data to propagate through to output data 

path 12 8 without modification. The setup data will be 
passed to output data path 128. Control logic 124 may 
perform additional decoding of the setup data either 
via path 204 or via path 210. 

10 Register file 114 can be loaded with setup 

data or program data. In both cases, register file 114 
receives input data directly from output data path 12 8 
via input data path (Di„) 206. A register file 114 
address to which input data can be written is sent from 

15 control logic 124 via write address path (WrAudr) 138. 

Data in register file 114 can be read under the control 
of a stored program: output data is driven into 
execution pipeline 118 via output data path (Dqut) 130. 
A register file 114 address from which output data can 

20 be read is sent from control logic 124 via a read 
address path (RdAddr) 208. 

Memory 122 can receive data from output data 
path 128 via path 210. For example, setup data may 
cause a program to be loaded into memory 122 . 

25 Memory 122 can also connect to program counter 126 via 
path 146. Path 146 can be used as a read address path 
(RdAdcr) (e.g., during run mode) or a write address path 
(WrAodr) (e.g., during setup mode) . Data can be sent 
from memory 12 2 to control logic 124 via output data 

30 path (Dout) 144. 

Program counter 12 6 is preferably used to 
access appropriate locations in memory 122 . A value in 
program counter 12 6 can be sent to memory 122 via 
path 146. The value in program counter 126 is either 

3 5 incremented by or replaced with a new counter address 
from control logic 124 via data path 150. 
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FIG. 3 illustrates a sample input data 
stream 3 00. Input data stream 300 includes multiple 
pieces of input data 312. Each piece of input data 312 
preferably contains an identification field 3 02, which 
5 is one or more bits, and a data field 3 06, which 
contains either program data or setup data. An 
identification bit of "0" may indicate, for example, 
setup data (setup mode) , while an identification bit of 
"1" may indicate program data (run mode) . 

10 A piece of input data that is identified as 

setup data can be decoded further either from 
additional identification bits 302 or from some subset 
of the input data itself. This allows each piece of 
input data to contain both a setup data value and a 

15 setup program instruction. The sequence of input data 
values that are identified as setup data act as the 
instruction sequence of the setup program, which is 
executed by the control logic in setup mode. 

The instructions of the setup program perform 

20 several important functions. One function is loading a 
program counter (PC) with a program memory address. 
The PC value is preferably supplied on the input data 
path. A second function is loading memory. A set of 
instructions are supplied on the input data path, and 

25 each instruction is loaded into memory at the current 
PC value. The PC is then incremented before a next 
instruction is loaded. A third function is loading a 
register file counter. The counter value can be 
supplied on the input data path. A fourth function is 

3 0 loading the register file. A value supplied on the 

input data path is loaded into the register file at the 
current value of the register file counter. The 
register file counter is then incremented before a next 
value is loaded. 

3 5 When any setup instruction is executed by 

control logic, the associated input data is propagated 
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through the execution pipeline via execution of a 
"pass -through" instruction. When the data reaches the 
output data path (which is the same data as that on the 
input data path) , control logic generates the 
5 appropriate control signals to load the data in the 
appropriate storage location. While the system is in 
setup mode, it may be necessary to inhibit exceptions, 
such as interrupts, because the system may be in a 
transitory state and therefore unable to process them. 

10 FIG. 4 illustrates the processing of program 

and setup data in a single input data path processing 
engine. Process 400 begins after the system has been 
reset. Because the "setup program" has little or no 
dependence upon the state of the processing engine at 

15 the time the setup program begins to execute, it can be 
used to set the initial state of the system. At 
step 404, the system enters setup mode. At step 406, 
process 400 determines whether a piece of input data is 
available, and if so, checks the identification bit. 

20 When the identification bit indicates setup data, 

process 400 moves to step 408 to process that piece of 
input data. When the piece of input data has been 
processed, process 400 moves back to step 406 to check 
for a next piece of input data. If no data is 

25 detected, process 400 moves to step 410, where the 

system waits in its current mode until a next piece of 
data is available. 

When the identification bit indicates program 
data, process 40 0 moves to step 412 where the system 

3 0 enters run mode. Process 400 then moves to step 414 
where it decodes a stored instruction at the current 
value of the program counter (PC) . At step 416, 
process 40 0 determines if the decoded instruction 
requires any input data. If the instruction does not 

35 require input data, process 400 moves to step 422, 

where the processing engine executes the instruction 
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and updates the PC (but does not process any input 
data) . If the execution of an instruction requires 
input data, process 400 moves to step 418. At 
step 418, process 400 examines the identification bit 
5 of a next piece of input data. If the piece of input 
data is program data, process 400 moves to step 422 
where the instruction is executed, the PC is updated, 
and the input data is processed. Process 400 then 
moves back to step 414. If data is not available, 

10 process 400 moves to step 42 0 where the system waits in 
its current mode. If the identification bit indicates 
setup data, process 400 moves back to step 404 without 
executing the decoded instruction. 

Thus, process 400 can run continuously, with 

15 the processing engine automatically switching between 
setup mode and run mode each time a change in 
identification bits is detected. In run mode, the 
processing engine executes instructions that are read 
from memory. In setup mode, control logic executes the 

2 0 setup program, which may make use of instructions and 

operations not available in run mode. 

The single input data path processing engine 
of the present invention has several advantages. 
Foremost, initialization information passes through a 

25 single data path. All programs, constants, and program 
data can be pre-processed into a single data stream, 
requiring no external control logic to manage the flow 
of data. Arranging the input data into a single data 
stream allows self - synchronization . A new program will 

30 not be loaded until all the data from a previous 

program (which precedes it in the data stream) has been 
processed by that previous program. Input data is 
analogous to the output of a first-in-first-out (FIFO) 
system. The source of the input data advantageously 

3 5 can be in a separate timing domain or can supply data 

at a sporadic rate without compromising the 
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synchronization between the different pieces of program 
data and setup data in the input data stream. Both 
control logic and data paths are simplified by reusing 
functions that exist for normal operation. For 
5 instance, control logic already has the ability to 
write to the register file and the program counter. 
Because data required for state changes passes down the 
execution pipeline, it remains ordered with respect to 
program data. This allows state changes to be 

10 pipelined, avoiding the inefficiency of draining the 
execution pipeline. Finally, the instructions of the 
setup program advantageously do not reduce the opcode 
space available to normal programs because these 
instructions are only available in setup mode. Because 

15 setup data and program data are sent to the execution 
pipeline as separate pieces of input data, the number 
of bits representing setup data and program data can be 
customized. 

Additional features or variations of the 

20 basic processing engine are also included in other 
embodiments of the present invention. For example, 
instead of reserving an identification field in the 
input data, one or more values that cannot normally 
occur in input data can be decoded and used to initiate 

25 the transition from run mode to setup mode. The same 
or a different value can be decoded and used to 
initiate the transition from setup mode to run mode. 

Also, instead of the described setup program 
which can include a number of different instructions 

30 along with the input data, a simple system according to 
the invention may require only a single instruction or 
a single inflexible sequence. The mode change from run 
mode to setup mode can be sufficient to trigger the 
execution of a "hard wired" instruction or setup 

35 program by the control logic. 
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Alternatively, a system may require a highly 
complex setup program. In this case, the program 
according to the present invention can be stored in a 
separate read-only memory (ROM) and executed under the 
5 control of a dedicated setup program counter. This 
operation is similar to the behavior of a multi- 
threaded processor. 

A "setup program" instruction can also be 
provided that propagates an input value straight 

10 through the execution pipeline to the output data path. 
This requires the addition of output data ID signals 
generated by control logic in accordance with the 
current operation mode. This allows appropriately 
tagged values to exist in the input data stream 

15 arbitrarily interleaved with program data. These 

special data values reach the output in a way that is 
asynchronous (relative to program execution) , but have 
a pre-defined ordering relative to computed output 
values. Additional outputs from control logic, which 

20 are valid at the same time as the output data 

valid signal, can indicate whether a piece of output 
data is a value generated by the execution pipeline or 
propagated through the pipeline by a setup program. 

Thus it is seen that a processing engine is 

25 provided for loading code, data, and control 

information through a single input data path with less 
complex hardware. One skilled in the art will 
appreciate that the present invention can be practiced 
by other than the described embodiments, which are 

3 0 presented for purposes of illustration and not of 

limitation, and the present invention is limited only 
by the claims which follow. 



